On universality in human correspondence activity

Identifying and modeling patterns of human activity has important ramifications
in applications ranging from predicting disease spread to optimizing resource 
allocation. Because of its relevance and availability, written correspondence pro- 
vides a powerful proxy for studying human activity. One school of thought is that 
human correspondence is driven by responses to received correspondence, a view
that requires distinct response mechanisms to explain e-mail and letter
correspondence observations. Here, we demonstrate that, like e-mail correspondence, the
letter correspondence patterns of 16 writers, performers, politicians, and scien- 
tists are well-described by the circadian cycle, task repetition and changing com- 
munication needs. We confirm the universality of these mechanisms by properly 
rescaling letter and e-mail correspondence statistics to reveal their underlying 
similarity. 

Power-law statistics are a hallmark of critical phenomena. A less obvious characteristic of
criticality is the emergence of universality classes that capture the similarity of seemingly disparate
systems. For example, despite the fact that water and carbon dioxide have different chemical prop- 
erties, they were observed to behave in the same manner close to their respective critical points (1). 
This is because idiosyncrasies, such as the existence of electric dipoles or the ability to form 
hydrogen bonds, become irrelevant near the liquid-gas critical point. For physical systems, renor- 
malization group theory (2, 3) has enabled researchers to understand the deep connection between 
the symmetries of a system and the mechanisms which underlie its behavior. The similarity of dif- 
ferent fluids near their respective liquid-gas critical points is often demonstrated by rescaling their 
statistics such that they collapse onto the same universal curves — oftentimes power-laws which 
have particular scaling exponents. By grouping different substances into the same "universality 
class," as identified by its scaling exponents, one uncovers that fluids are described by the same 
statistical laws near the liquid-gas critical point as uniaxial magnets are near their paramagnetic 
critical point (1). Importantly, one can also differentiate the behavior of these systems from the 
behavior of polymers near the sol-gel transition, which belong to a different universality class (1). 

In addition to critical phenomena, power-law scaling has also been widely reported in biol- 
ogy, economics, and sociology (4-10). Renormalization group theory therefore offers a tantalizing 
hypothesis for the prevalence of particular power-law scaling exponents in social systems: social 
systems, in analogy with physical systems, may operate near critical points and can therefore be 
classified into a small number of distinct universality classes. A heated debate has consequently 
ensued in the literature concerning the "universality of human systems" (in the statistical physics 
meaning of the phrase). Is there enough statistical evidence for the asymptotic power-law
description of the heavy-tailed distributions reported in human systems (11-14)? Is it reasonable
to postulate that social systems, like their physical counterparts (2, 3, 15), can be classified into
universality classes according to scaling exponents (16)?

Human correspondence is a paradigmatic area in which power-law scaling and universality
are contentious issues. One view that has recently received significant attention in the
literature (17, 18) posits that correspondence patterns are driven primarily by the need to respond
to other individuals. This is formalized by a priority queuing model (19) which, under certain
limiting conditions, reproduces the asymptotic scaling of empirically observed heavy-tailed corre- 
spondence statistics. In particular, the heavy-tailed statistical properties of e-mail correspondence 
are reportedly reproduced by a fixed-length queue with a single task type (19, 20) whereas the 
heavy-tailed statistical properties of letter correspondence are reportedly reproduced by either a 
variable-length queue with a single task type (20, 21) or by a fixed-length queue with multiple task 
types (22). The fact that there are different exponents for the two modes of correspondence has 
been taken as evidence that human correspondence falls into one of two universality classes (20). 
When interpreted in the statistical mechanics sense of "universality," one would conclude that e- 
mail and letter correspondence are fundamentally different activities. 

In contrast, we hypothesize that human correspondence patterns are not driven by responses 
to others but by more prosaic mechanisms — circadian cycles, task repetition and changing com- 
munication needs. We formalize these mechanisms with a cascading non-homogeneous Poisson 
process, which we have previously shown to be statistically consistent with e-mail communication 
patterns (14). Here, we hypothesize that the same model is capable of describing letter
correspondence and that the heavy-tailed correspondence statistics arise primarily from the variation in an
individual's communication needs over the course of their lifetime.

We obtained the letter correspondence records for 16 writers, performers, politicians, and sci- 
entists. Each data set consists of a list of letters that were sent by each of these individuals, and 
each record comprises the name of the sender, the name of the recipient, and the date when it 
was written (see Sec. S1 for details). The nature of the data raises two issues to consider during
analysis. First, the precise authorship date of some letters is unknown, so we restrict our analysis 
to only those letters that have precise authorship dates. Second, it is highly unlikely that all of the 
letters written by a particular individual are present in the database. We have confirmed that our 
results are insensitive to sampling effects from this method of data collection (Sec. S2). 

An important consideration in studying the letter correspondence patterns of these
individuals is that the data covers their entire lifetimes. As a result, it is quite conceivable that changing
communication needs might affect letter correspondence patterns. For example, before Einstein
became widely known, the bulk of his recorded communication was to friends and relatives. After
the confirmation of his theory of relativity in 1919, Einstein's need to communicate with other
individuals substantially increased. By that time, his step-daughter Ilse Einstein was helping him with
secretarial tasks, resulting in greatly improved coverage of his recorded correspondence (23). Due
to this secretarial assistance and his increased fame, we expect that the average time between
consecutively sent letters, the average inter-event time $\langle\tau\rangle$, is significantly larger during the beginning
of Einstein's life than during the latter part of his life. Our expectations are verified in Fig. 1A-B,
demonstrating that these time series are non-stationary; that is, the heavy-tailed inter-event time
distribution results from a mixture of time scales (24).
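
As an illustration of how a mixture of time scales can produce a heavy tail, the following minimal sketch compares the fraction of inter-event times exceeding a given value for a two-period mixture and for a single-rate process with the same overall rate. It assumes, as a simplification that is not the model fitted in this paper, that each period is a homogeneous Poisson process and that each period contributes inter-event times in proportion to its rate times its duration. The function names and the example rates are hypothetical.

```python
import math

def mixture_survival(tau, rates, durations):
    """P(inter-event time > tau) pooled over periods with different rates.

    ASSUMPTION: period i is a homogeneous Poisson process with rate rates[i]
    lasting durations[i]; it contributes about rates[i] * durations[i] events.
    """
    weights = [r * d for r, d in zip(rates, durations)]
    total = sum(weights)
    return sum(w * math.exp(-r * tau) for w, r in zip(weights, rates)) / total

def single_rate_survival(tau, rates, durations):
    """Same quantity for one homogeneous process with the same overall rate."""
    overall_rate = sum(r * d for r, d in zip(rates, durations)) / sum(durations)
    return math.exp(-overall_rate * tau)

if __name__ == "__main__":
    rates = [0.02, 1.0]            # letters per day, a hypothetical 50-fold contrast
    durations = [3640.0, 3640.0]   # days spent in each period
    for tau in (1, 10, 50, 100):
        print(tau,
              mixture_survival(tau, rates, durations),
              single_rate_survival(tau, rates, durations))
```

In this toy setting the pooled distribution decays far more slowly at large inter-event times than the single-rate exponential, even though neither period is heavy-tailed on its own.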

Since these time series are non-stationary, we partitioned each complete time series into
smaller time segments so that we can make the approximation that the behavior within each time 
segment is stationary. We accomplish this by splitting the time series into segments lasting 364 
days (52 weeks) unless fewer than 10 events fall within that time period, in which case consecutive 
segments are merged until this criterion is met. 
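
The partitioning rule above can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the function name `segment_events` is hypothetical, and the handling of a final stretch with fewer than 10 events (it is merged into the preceding segment) is an assumption, since the text does not specify it.

```python
from datetime import date, timedelta

def segment_events(event_dates, segment_days=364, min_events=10):
    """Split event dates into consecutive windows of `segment_days` days,
    merging consecutive windows until each holds at least `min_events` events.

    Returns a list of (start, end, events) tuples with half-open spans [start, end).
    """
    events = sorted(event_dates)
    if not events:
        return []
    step = timedelta(days=segment_days)
    segments = []
    start = events[0]
    end = start + step
    current = []
    for d in events:
        while d >= end:
            # At a window boundary: close the segment if it is populated enough,
            # otherwise keep the same start and extend it by one more window.
            if len(current) >= min_events:
                segments.append((start, end, current))
                start, current = end, []
            end += step
        current.append(d)
    if len(current) >= min_events or not segments:
        segments.append((start, end, current))
    else:
        # ASSUMPTION: an under-populated tail is merged into the previous segment.
        prev_start, _, prev_events = segments.pop()
        segments.append((prev_start, end, prev_events + current))
    return segments
```

Each returned segment can then be treated as approximately stationary and fitted separately.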

Assuming that the correspondence patterns within each time segment are stationary, we can 
then model the behavior within each time segment with standard techniques. As a first
approximation, one might naively expect that letters are sent at a constant rate $\rho$ and that the time at which
every letter is sent is independent of all others. Such a process is referred to as a homogeneous
Poisson process, which gives rise to an exponential inter-event time distribution $p(\tau) = \rho e^{-\rho\tau}$.
While the tail of the inter-event time distribution within these time segments is approximately
exponential, the best-estimate predictions of a homogeneous Poisson process do not produce the
correct decay rate (Fig. 1C). This suggests that only a few changes to the homogeneous Poisson
process are needed to statistically reproduce the observed inter-event time distribution. We
hypothesize that, like e-mail correspondence, two additional ingredients must also be considered for
modeling letter correspondence (14).
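
For reference, the exponential form quoted above follows in one step from the definition of a homogeneous Poisson process. The short derivation below is standard and uses only the symbols already introduced ($\rho$ for the rate, $\tau$ for the inter-event time).

```latex
% For a homogeneous Poisson process with rate \rho, the number of letters
% sent in a window of length \tau is Poisson distributed with mean \rho\tau.
\begin{align}
  P(\text{no letter within } \tau) &= e^{-\rho\tau}, \\
  p(\tau) &= -\frac{d}{d\tau}\, e^{-\rho\tau} = \rho\, e^{-\rho\tau}, \\
  \langle \tau \rangle &= \int_0^{\infty} \tau\, \rho\, e^{-\rho\tau}\, d\tau = \frac{1}{\rho}.
\end{align}
```

On semi-logarithmic axes this distribution is a straight line whose slope is $-\rho$, which is the decay rate referred to above.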

First, daily and weekly cycles of activity may influence when individuals communicate.
Previously, we accounted for these cycles of activity in e-mail communication with a non-homogeneous
Poisson process whose rate $\rho(t)$ changes periodically on daily and weekly time scales. For letter
correspondence, however, the resolution of the data does not permit us to identify activity
patterns within a day, and day-to-day changes in activity provide no additional insight (Sec. S3). We
therefore approximate the non-homogeneous Poisson process defined by $\rho(t)$ by a homogeneous
Poisson process with constant rate $\rho_i$ during time segment $i$; that is, we model the rate of activity
$\rho(t)$ throughout each individual's life by a piecewise constant function of time.

Second, individuals are much more likely to continue writing letters once they have written one
letter in order to use their time more effectively. We account for this behavior by hypothesizing
that, once an individual finishes writing a letter, there is a probability $\xi_i$ that they write another
letter. This process repeats itself until this cascade of additional letters concludes with probability
$1 - \xi_i$, at which point the individual's behavior is again governed by a homogeneous Poisson
process with rate $\rho_i$ (25). We refer to the resulting model as a cascading Poisson process.
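
The mechanism described above can be sketched as a short simulation for a single time segment. This is an illustration under stated assumptions rather than the implementation used here: the function name is hypothetical, and the letters added during a cascade are assumed to share the time stamp of the letter that started it, which is a simplification motivated by the daily resolution of the data.

```python
import random

def simulate_cascading_poisson(rho, xi, duration_days, seed=None):
    """Simulate letter times (in days) on [0, duration_days).

    rho: rate of the homogeneous Poisson process that starts a cascade (per day, > 0)
    xi:  probability of writing another letter after finishing one (0 <= xi < 1)
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rho)
    while t < duration_days:
        times.append(t)               # letter that starts the cascade
        while rng.random() < xi:      # continue with probability xi, stop with 1 - xi
            times.append(t)           # ASSUMPTION: cascade letters share this time stamp
        t += rng.expovariate(rho)     # afterwards, back to the Poisson process
    return times
```

Under these assumptions each cascade contains 1 / (1 - xi) letters on average, so a segment of length T is expected to contain about rho * T / (1 - xi) letters.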

To compare the predictions of the cascading Poisson process (26) to the empirical data, we 
must first estimate the parameters $\theta_i = \{\rho_i, \xi_i\}$ from the data during each time segment. The
nature of the data, however, raises an important concern for parameter estimation: since each event 
is only known to occur within a particular day, not at a precise time of the day, the data are interval 
censored (27). We account for the interval censored data and calculate the best-estimate parameters 
$\hat{\theta}_i$ by numerically maximizing the censored likelihood function (see Sec. S4 for the derivation).
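
To make the idea of a likelihood for day-resolution data concrete, the sketch below fits the two parameters to daily letter counts. It is not the derivation of Sec. S4, which is not reproduced here. It assumes that every cascade begins and ends within a single day, so that the number of letters on a day is a Poisson number of cascades, each with a geometrically distributed size (a Polya-Aeppli distribution), and it replaces a proper numerical optimizer with a plain grid search. All function names are hypothetical.

```python
import math
from collections import Counter

def log_pmf_daily_count(n, rho, xi):
    """log P(n letters on one day), with rho cascades per day on average
    and continuation probability xi. ASSUMPTION: cascades never span two days."""
    if n == 0:
        return -rho
    total = 0.0
    for k in range(1, n + 1):  # k = number of cascades started that day
        total += (rho ** k / math.factorial(k)) * math.comb(n - 1, k - 1) \
                 * (1.0 - xi) ** k * xi ** (n - k)
    return -rho + math.log(total)

def censored_log_likelihood(count_hist, rho, xi):
    """count_hist maps a daily count n to the number of days with that count."""
    return sum(days * log_pmf_daily_count(n, rho, xi)
               for n, days in count_hist.items())

def fit_by_grid(daily_counts, rho_grid, xi_grid):
    """Return the (rho, xi) pair on the grid with the largest log-likelihood."""
    count_hist = Counter(daily_counts)
    best = max((censored_log_likelihood(count_hist, r, x), r, x)
               for r in rho_grid for x in xi_grid)
    return best[1], best[2]
```

Here `daily_counts` holds one integer per day of the segment, including the days on which no letter was sent; `rho_grid` must contain only positive values and `xi_grid` only values in [0, 1).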

The resulting best-estimate parameters $\hat{\theta}_i$ provide insight into the correspondence patterns of
each individual (Fig. 2A-B and Fig. S4). For example, while both Schoenberg and Einstein have
a 50-fold increase in the rate at which they send letters (presumably due to their increasing
correspondence obligations and a more complete sampling of their overall letter correspondence), their
utilization of cascades of activity is markedly different. Schoenberg, for instance, sends about 21%
of his letters during cascades of multiple letters throughout his life. In contrast, Einstein rarely
utilizes cascades of activity as a young man (before 1910) whereas in later years (after 1933) he sends
approximately 34% of his letters during cascades of multiple letters.
In the period 1928-1933, Einstein sent over 50% of his letters during cascades of multiple
letters. The start of this period coincides with the hiring of Einstein's long-time secretary Helen 
Dukas, who more systematically retained copies of his outgoing correspondence. After the Nazis 
took over power in January 1933, his correspondence patterns change markedly; this possibly 
reflects changes in his correspondence obligations at Princeton University after immigrating to the 
United States in late 1933 (23). 

Of course, inferring how an individual's behavior changes based on a model's parameter es- 
timates is contingent upon the model being consistent with the data. We tested the statistical 
consistency of our model with the data by Monte Carlo hypothesis testing (Sec. S5). We reject the 
model during a particular time segment if the p-value obtained from the Monte Carlo hypothesis 
testing procedure is less than a threshold of 0.05. Because this threshold is greater than zero, there
is a finite chance that we will reject the hypothesis that the model is consistent
with the data even if the data were generated from the model.
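
The general shape of a Monte Carlo hypothesis test can be sketched as follows. The procedure actually used is described in Sec. S5 and is not reproduced here. For brevity the sketch tests the simpler homogeneous Poisson model rather than the cascading model, ignores interval censoring, and uses the Kolmogorov-Smirnov distance as the test statistic. These are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random

def ks_distance_exponential(samples):
    """KS distance between the samples and an exponential with the fitted rate."""
    xs = sorted(samples)
    n = len(xs)
    rate = n / sum(xs)  # maximum-likelihood rate for uncensored data
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * x)
        d = max(d, abs(cdf - i / n), abs(cdf - (i + 1) / n))
    return d

def monte_carlo_p_value(samples, n_sim=1000, seed=0):
    """Fraction of synthetic data sets that fit the model at least as badly as the data."""
    rng = random.Random(seed)
    n = len(samples)
    rate = n / sum(samples)
    d_obs = ks_distance_exponential(samples)
    exceed = 0
    for _ in range(n_sim):
        synthetic = [rng.expovariate(rate) for _ in range(n)]
        # Refit the rate on every synthetic data set, exactly as done for the data.
        if ks_distance_exponential(synthetic) >= d_obs:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)
```

With a threshold of 0.05, the model would be rejected for a time segment whenever the returned p-value falls below 0.05.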

If we assume that each time segment is independent, then we would expect to reject each of 
the time segments with a 5% chance and the total number of rejections to be distributed according 
to a binomial model (28). Out of the 54 independent time segments for Einstein, for example, we
would expect to reject the model 2.7 times on average, with 0-6 defining the bounds of the 95% confidence
interval of the corresponding binomial model. For Einstein, our procedure "rejects" the cascading
Poisson process for 2 out of 54 time segments, indicating that we cannot reject the hypothesis
that the model is able to explain his correspondence patterns. Indeed, our hypothesis testing
confirms that the cascading Poisson process cannot be rejected as an explanatory model for the letter
correspondence of any of the individuals under consideration (Tbl. 1). These results demonstrate
that the origin of the heavy-tailed inter-event time distribution is a mixture of distributions with
different time scales (Fig. 2C-E).
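
The binomial bookkeeping above can be checked with a few lines of code. The text does not state how the confidence interval is constructed, so the sketch assumes a one-sided interval that starts at 0 and ends at the smallest count whose cumulative probability reaches 95%. The function names are hypothetical.

```python
from math import comb

def expected_rejections(n_segments, alpha=0.05):
    """Mean number of rejections if each segment is rejected with probability alpha."""
    return n_segments * alpha

def rejection_upper_bound(n_segments, alpha=0.05, level=0.95):
    """Smallest k with P(X <= k) >= level for X ~ Binomial(n_segments, alpha).

    ASSUMPTION: one-sided interval [0, k]; the source does not specify the construction.
    """
    cdf = 0.0
    for k in range(n_segments + 1):
        cdf += comb(n_segments, k) * alpha ** k * (1 - alpha) ** (n_segments - k)
        if cdf >= level:
            return k
    return n_segments

if __name__ == "__main__":
    # Einstein: 54 time segments, each rejected with probability 0.05 under the model.
    print(expected_rejections(54), rejection_upper_bound(54))
```

A model is then regarded as consistent with an individual's record when the observed number of rejected segments does not exceed this bound.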

Our findings enable us to address a crucial question: do e-mail and letter correspondence
belong to different universality classes (20)? Since the same mechanistic model is capable of
describing both e-mail and letter correspondence, we can answer this question in the negative. We
demonstrate the underlying similarity of both correspondence activities by rescaling and collapsing 
the inter-event time distributions for 16 randomly selected e-mail correspondents (29) for which 
we have model parameter estimates (14) and the 16 letter correspondents studied here (Fig. 3). The
rescaled inter-event time distributions agree with theoretical expectations (30), demonstrating that 
the same exponential statistical law is indeed capable of describing both correspondence patterns. 
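
The rescaling can be sketched for a piecewise constant rate as follows. Dividing an inter-event time by the average inter-event time expected over the same interval is equivalent to integrating the rate between the two events, which is what the code computes. The sketch ignores cascades, and the function names and argument layout are assumptions made for illustration.

```python
def cumulative_rate(t, boundaries, rates):
    """Integral of a piecewise constant rate from boundaries[0] up to time t.

    boundaries: increasing segment edges, len(rates) + 1 entries
    rates:      constant rate inside each segment
    """
    total = 0.0
    for lo, hi, rho in zip(boundaries[:-1], boundaries[1:], rates):
        if t <= lo:
            break
        total += rho * (min(t, hi) - lo)
    return total

def rescale_inter_event_times(event_times, boundaries, rates):
    """Rescaled inter-event times: the integral of the rate between consecutive events."""
    integrated = [cumulative_rate(t, boundaries, rates) for t in event_times]
    return [b - a for a, b in zip(integrated, integrated[1:])]

if __name__ == "__main__":
    # Two segments: rate 2.0 on [0, 1) and rate 0.5 on [1, 4).
    print(rescale_inter_event_times([0.5, 1.5, 3.0], [0.0, 1.0, 4.0], [2.0, 0.5]))
```

If the rate function describes the data, the rescaled times should follow an exponential distribution with unit rate regardless of how strongly the rate varies between segments.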

Only by understanding and validating the underlying mechanisms can we appropriately rescale 
e-mail and letter correspondence to reveal their underlying similarity. Unlike critical phenomena, 
the universality here does not arise from the irrelevance of idiosyncrasies but rather from the fact 
that these two different modes of communication are governed by the same mechanisms. This 
insight is not apparent just by studying the asymptotic scaling of an empirical distribution obtained 
from an individual; one simply cannot infer that different "scaling" exponents necessarily imply 
different mechanisms. 

Our results therefore raise significant questions about the nature of universality in complex 
phenomena, in general, and in human correspondence, in particular. Perhaps the most common 
universal statistical law is due to the central limit theorem — sums of variates with finite fluc- 
tuations converge to a Gaussian distribution. When confronted with statistical patterns that are 
non-Gaussian one is tempted to surmise that the system's fluctuations are not finite. In analogy 
to physical systems, the recurrence of power-law dependencies with similar exponent values in 
biological or social systems is frequently hypothesized to arise from the fact that these systems 
operate near critical points where particular details of the system become irrelevant. 

A less-explored hypothesis, as exemplified here, is that heavy-tailed distributions emerge as a 
result of non-stationarities in the absence of criticality (14, 31). Our study demonstrates that human 
correspondence can be accurately modeled as a cascading non-homogeneous Poisson process — a 
non-critical process. This process gives rise to heavy-tailed statistics but not to power-law statis- 
tics characterized by critical exponents. Instead, the correspondence patterns of each individual 
are uniquely characterized by the parameters of our model (32); the process is universal, but the 
parameters are not.

Indeed, we postulate that the cascading Poisson process, which formally incorporates the
circadian cycle, task repetition and changing needs, may accurately describe many other aspects of
human activity. The circadian cycle has such physiologic impact that it is natural to surmise that it 
will affect numerous human behaviors, from eating habits to commuting routines. Task repetition 
is similarly important due to the increased efficiency it enables; once an individual makes one pur- 
chase at a mall, it is easier to make other purchases within that mall during the same trip than it is 
to return to the mall the following day. As one ages and changes roles, it is not hard to imagine that 
the extent to which the circadian cycle and task repetition influence their activity might change
over time. It is therefore plausible that the cascading Poisson processes presented here could be 
generalized to account for different types of activities, each with its own evolving parameters.

Tbl. 1: Summary of the letter correspondence records and hypothesis testing results for the 16
individuals under consideration, ordered chronologically. For each individual, we note the time period
and duration of the letter correspondence records, the total number of letters sent, the number of
time segments with at least 10 letters per time segment, the 95% confidence interval (CI) bounds
of the corresponding binomial model with p = 0.05, and the number of rejections of the cascading
Poisson process based on our Monte Carlo hypothesis testing procedure. The number of Monte
Carlo hypothesis testing rejections is within the 95% confidence interval bounds for all 16
individuals, indicating that this model cannot be rejected for any individual's letter correspondence
patterns. We have conducted the same analysis for three alternative models; we find that a
cascading Poisson process provides the most parsimonious and statistically consistent explanation of the
data (Sec. S3).





Individual             Time period  Duration (yr)  Number of letters  Number of segments  95% CI  Number of rejections
Francis Bacon          1574-1626    53             443                19                  [0,3]   3
James H. Leigh Hunt    1790-1859    70             408                25                  [0,3]   1
Charles Darwin         1822-1882    61             6,785              52                  [0,5]   4
Anna Brownell Jameson  1833-1860    28             119                8                   [0,2]   1
Friedrich Engels       1833-1895    63             369                24                  [0,3]   1
Robert E. Lee          1835-1870    36             282                10                  [0,2]
Karl Marx              1837-1882    46             469                25                  [0,3]   1
Henry Irving           1852-1905    54             1,205              35                  [0,4]
Sigmund Freud          1872-1939    68             3,130              49                  [0,5]   2
Marcel Proust          1879-1922    44             668                25                  [0,3]   2
H. G. Wells            1895-1946    52             422                16                  [0,2]
Albert Einstein        1896-1955    60             10,319             54                  [0,6]   2
Carl Sandburg          1898-1966    69             1,894              37                  [0,4]   2
Arnold Schoenberg      1902-1951    50             6,899              47                  [0,5]   3
Ernest Hemingway       1909-1961    53             1,934              42                  [0,5]   5
Stan Laurel            1924-1964    41             685                17                  [0,3]   1

[Fig. 1 panels. Horizontal axes: Inter-event time, $\tau$ (d)]



Fig. 1: Non-stationarity of Albert Einstein's letter correspondence activity. While we select
Einstein as an example, non-stationarities are present for all 16 writers, performers, politicians, and
scientists studied here. A, Running average inter-event time $\langle\tau\rangle$ averaged over 100 consecutive
inter-event times. During the beginning of Einstein's life (blue shaded region), the average
inter-event time is significantly larger than during the end of his life (orange shaded region). B,
Logarithmically binned probability density of the non-zero inter-event times $\tau$. If we separately consider
the inter-event time distribution during each portion of Einstein's life, it is clear that the complete
inter-event time distribution (black line) is actually a mixture of behaviors. To emphasize the
origins of the heavy-tailed distribution, the probability densities of each portion of Einstein's life are
normalized such that their integrals are equal to the fraction of non-zero inter-event times during
that time. C, Comparison of the empirical inter-event time distribution during a particular time
segment with the simulated predictions of the best-estimate homogeneous Poisson process that
is interval censored in the same manner as the data. It is visually apparent that a homogeneous
Poisson process is not consistent with the empirical data, which is confirmed by Monte Carlo
hypothesis testing (Sec. S3).

Fig. 2: Origin of heavy-tailed inter-event time distribution for Albert Einstein. While we select
Einstein as an example, the same explanation is relevant for all 16 writers, performers, politicians,
and scientists studied here. A-B, We estimate the parameters $\hat{\theta}_i = \{\hat{\rho}_i, \hat{\xi}_i\}$ by maximizing the
censored likelihood function for each time segment (Sec. S4). Grey shaded regions denote time
segments during which the cascading Poisson process is rejected by Monte Carlo hypothesis
testing. Parameter estimates for all individuals under consideration can be found in Fig. S4. Note
the 50-fold changes in the rate $\rho_i$ and the dramatic changes in $\xi_i$ for Einstein. C-D, The
cumulative distribution of inter-event times for Einstein during particular time segments compared with
the predictions of a non-stationary cascading Poisson process with the best-estimate parameters
(A-B). The model predictions are generated numerically by running the model defined by $\hat{\theta}(t)$
ten times and interval censoring the resulting synthetic time series in the same manner as the
empirical data. E, The cumulative distribution of inter-event times (circles) for Einstein over his entire
life compared with the predictions of a non-stationary cascading Poisson process (red line) with
the best-estimate parameters (A-B). When the mixture of behaviors is taken into account, the
origin of the heavy-tailed inter-event time distribution is clear. The inter-event time distributions
for all 16 letter correspondents under consideration can be found in Fig. S5.

[Fig. 3 panels. Horizontal axis: Rescaled inter-event time]

Fig. 3: Collapse of inter-event time distributions for letter and e-mail correspondence. A,
Cumulative distribution of inter-event times for all 16 letter correspondents (red lines) and 16
randomly selected e-mail correspondents (blue lines). B, Cumulative distribution of rescaled
inter-event times on logarithmic and linear (inset) axes. The inter-event time $\tau_k = t_{k+1} - t_k$ is
rescaled by the average inter-event time expected during the interval $[t_k, t_{k+1}]$, which is given
by $\langle\tau\rangle = (t_{k+1} - t_k) / \int_{t_k}^{t_{k+1}} \rho(s)\,ds$. By the time rescaling theorem (30), the resulting rescaled
inter-event time distribution is given by the expected inter-event time distribution for a
homogeneous Poisson process with unit rate, $P(\tau) = e^{-\tau}$ (black dashed line). We only consider inter-event
times $\tau > 0$ for letter correspondence.